Deep Neural Networks (DNNs) suffer from domain shift when the test dataset follows a distribution different from that of the training dataset. Domain generalization aims to tackle this issue by learning a model that can generalize to unseen domains. In this paper, we propose a new approach that explicitly removes domain-specific features for domain generalization. Following this approach, we propose a novel framework called Learning and Removing Domain-specific features for Generalization (LRDG) that learns a domain-invariant model by tactically removing domain-specific features from the input images. Specifically, we design a dedicated classifier for each source domain to effectively learn its domain-specific features. We then develop an encoder-decoder network to map each input image into a new image space where the learned domain-specific features are removed. With the images output by the encoder-decoder network, another classifier is designed to learn the domain-invariant features for image classification. Extensive experiments demonstrate that our framework achieves superior performance compared with state-of-the-art methods.
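A minimal sketch of how the two stages could be wired up in PyTorch (the architectures, the frozen-classifier confusion loss, and the 0.1 weighting are assumptions for illustration, not the authors' released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_DOMAINS, NUM_CLASSES = 3, 7  # hypothetical sizes

class ConvClassifier(nn.Module):
    """Small CNN used both for the domain-specific and the domain-invariant classifier."""
    def __init__(self, num_outputs):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, num_outputs)
    def forward(self, x):
        return self.head(self.features(x).flatten(1))

class EncoderDecoder(nn.Module):
    """Maps an input image to a new image of the same size."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, x):
        return self.net(x)

# Stage 1 (assumed): one classifier per source domain learns that domain's
# domain-specific features; the classifiers are then frozen.
domain_classifiers = [ConvClassifier(NUM_CLASSES) for _ in range(NUM_DOMAINS)]
for clf in domain_classifiers:
    clf.requires_grad_(False)

# Stage 2: the encoder-decoder removes domain-specific features and a new
# classifier learns the task on the transformed images.
enc_dec, invariant_clf = EncoderDecoder(), ConvClassifier(NUM_CLASSES)

def stage2_loss(x, y):
    x_new = enc_dec(x)
    task_loss = F.cross_entropy(invariant_clf(x_new), y)
    # Push the frozen domain-specific classifiers toward uninformative (uniform) outputs.
    confusion = 0.0
    for clf in domain_classifiers:
        log_p = F.log_softmax(clf(x_new), dim=1)
        confusion = confusion + (-log_p.mean())
    return task_loss + 0.1 * confusion / NUM_DOMAINS
```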
Given a piece of text, a video clip, and a reference audio, the movie dubbing (also known as Visual Voice Cloning, V2C) task aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference. V2C is more challenging than conventional text-to-speech tasks, as it additionally requires the generated speech to exactly match the varying emotions and speaking speed presented in the video. Unlike previous works, we propose a novel movie dubbing architecture that tackles these problems via hierarchical prosody modelling, which bridges the visual information to the corresponding speech prosody from three aspects: lip, face, and scene. Specifically, we align lip movement to the speech duration, and convey facial expression to speech energy and pitch via an attention mechanism based on valence and arousal representations inspired by recent psychology findings. Moreover, we design an emotion booster to capture the atmosphere from global video scenes. All these embeddings are used together to generate a mel-spectrogram, which is then converted into speech waves by an existing vocoder. Extensive experimental results on the Chem and V2C benchmark datasets demonstrate the favorable performance of the proposed method. The source code and trained models will be released to the public.
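As a hedged illustration of the face-to-prosody bridge, the sketch below lets phoneme-level text states attend over per-frame facial (valence/arousal-like) features to predict pitch and energy contours; the dimensions and attention layout are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class FaceProsodyAttention(nn.Module):
    """Illustrative sketch: phoneme states query per-frame facial features
    (stand-ins for valence/arousal embeddings) to predict pitch and energy."""
    def __init__(self, d_model=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.pitch_head = nn.Linear(d_model, 1)
        self.energy_head = nn.Linear(d_model, 1)

    def forward(self, phoneme_states, face_feats):
        # phoneme_states: (B, T_phone, d), face_feats: (B, T_video, d)
        ctx, _ = self.attn(query=phoneme_states, key=face_feats, value=face_feats)
        pitch = self.pitch_head(ctx).squeeze(-1)    # (B, T_phone) pitch contour
        energy = self.energy_head(ctx).squeeze(-1)  # (B, T_phone) energy contour
        return pitch, energy

# Toy usage with random tensors.
module = FaceProsodyAttention()
pitch, energy = module(torch.randn(2, 40, 256), torch.randn(2, 120, 256))
```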
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Plenty of methods have been developed for sample-efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we survey the state of this exciting field by comparing classical distributed deep reinforcement learning methods and studying the components that are important for achieving efficient distributed learning, covering settings from single-player, single-agent distributed deep reinforcement learning to the most complex multi-player, multi-agent scenarios. Furthermore, we review recently released toolboxes that help realize distributed deep reinforcement learning without requiring many modifications to their non-distributed versions. By analyzing their strengths and weaknesses, we develop and release a multi-player, multi-agent distributed deep reinforcement learning toolbox, which is further validated on Wargame, a complex environment, demonstrating the usability of the proposed toolbox for multi-player, multi-agent distributed deep reinforcement learning under complex games. Finally, we point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.
Humans can balance very well during walking, even when perturbed, yet robust walking remains difficult to achieve for bipedal robots. Here we describe the simplest balance controller that leads to robust walking for a linear inverted pendulum (LIP) model. The main idea is to use a linear function of the body velocity to determine the next foot placement, which we call linear foot placement control (LFPC). Using the Poincaré map, a balance criterion is derived, which shows that LFPC is stable when the velocity-feedback coefficient lies within a certain range. This range becomes much larger when stepping faster, indicating that faster stepping makes balancing easier. We show that various gaits can be generated by adjusting the controller parameters in LFPC. In particular, a dead-beat controller is discovered that can lead to steady-state walking in just one step. The effectiveness of LFPC is verified through MATLAB simulation as well as V-REP simulation for both 2D and 3D walking. The main feature of LFPC is its simplicity and inherent robustness, which may help us understand the essence of how to maintain balance in dynamic walking.
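To make the controller concrete, here is a small numerical sketch of a 2D LIP with velocity-proportional foot placement; the parameter values and the feedback gain are illustrative assumptions, not the paper's reported values:

```python
import numpy as np

# Linear inverted pendulum (LIP) walking with a linear foot placement rule.
g, z0, T = 9.81, 1.0, 0.4          # gravity, CoM height, step duration (assumed)
w = np.sqrt(g / z0)                # LIP natural frequency

def lip_step(x, v, p, t):
    """Closed-form LIP evolution about a fixed foot position p."""
    c, s = np.cosh(w * t), np.sinh(w * t)
    x_new = (x - p) * c + (v / w) * s + p
    v_new = (x - p) * w * s + v * c
    return x_new, v_new

def foot_placement(x, v, k=0.35):
    """LFPC idea: next foot position is a linear function of the body velocity."""
    return x + k * v

# For this simplified 2D sketch the touchdown velocity evolves as
# v <- (cosh(w*T) - k*w*sinh(w*T)) * v, so gains keeping that factor in (-1, 1)
# are stabilizing, consistent with the feedback-coefficient range in the abstract.
x, v, p = 0.0, 0.3, 0.0
for step in range(10):
    x, v = lip_step(x, v, p, T)    # swing phase of duration T
    p = foot_placement(x, v)       # place the next foot at touchdown
    print(f"step {step}: CoM velocity at touchdown = {v:.3f} m/s")
```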
This paper proposes an efficient and safe method for LiDAR-based static and dynamic obstacle avoidance. First, point clouds are used to generate a real-time local grid map for obstacle detection. Obstacles are then clustered by the DBSCAN algorithm and enclosed by minimum bounding ellipses (MBEs). In addition, data association is performed to match each MBE with the obstacles in the current frame. Taking the MBEs as observations, a Kalman filter (KF) is used to estimate and predict the motion states of the obstacles. In this way, the trajectory of each obstacle over the forward time horizon can be parameterized as a set of ellipses. Because of the uncertainty of the MBEs, the semi-major and semi-minor axes of the parameterized ellipses are expanded to ensure safety. We extend the conventional control barrier function (CBF) and propose a dynamic control barrier function (D-CBF). We combine the D-CBF with model predictive control (MPC) to implement safety-critical dynamic obstacle avoidance. Experiments in both simulated and real-world scenarios are conducted to verify the effectiveness of our algorithm. The source code is released for the reference of the community.
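A minimal sketch of the detection-and-prediction part of such a pipeline (not the released code; the covariance-based ellipse, noise covariances, and time step below are illustrative stand-ins, and a true minimum bounding ellipse would need e.g. Khachiyan's algorithm):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_obstacles(points_2d, eps=0.5, min_samples=5):
    """Cluster 2D obstacle points and approximate each cluster by an inflated ellipse."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_2d)
    ellipses = []
    for k in set(labels) - {-1}:
        c = points_2d[labels == k]
        center = c.mean(axis=0)
        evals, evecs = np.linalg.eigh(np.cov(c.T))
        axes = 3.0 * np.sqrt(np.maximum(evals, 1e-9))   # inflate axes for a safety margin
        ellipses.append((center, axes, evecs))
    return ellipses

# Constant-velocity Kalman filter for one obstacle center (state: [x, y, vx, vy]).
dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])
Q, R = 0.01 * np.eye(4), 0.05 * np.eye(2)

def kf_step(x, P, z):
    x, P = F @ x, F @ P @ F.T + Q                  # predict motion state
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    x = x + K @ (z - H @ x)                        # update with observed ellipse center z
    P = (np.eye(4) - K @ H) @ P
    return x, P
```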
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the agent's behavior as the environment changes over its lifetime, while minimizing catastrophic forgetting of the learned information. To address this challenge, in this paper we propose DaCoRL, i.e., dynamics-adaptive continual RL. DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multi-head neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as a procedure of online Bayesian infinite Gaussian mixture clustering on environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. In addition, we employ an expandable multi-head neural network whose output layer is synchronously expanded with each newly instantiated context, together with a knowledge distillation regularization term to preserve performance on learned tasks. As a general framework that can be combined with various deep RL algorithms, DaCoRL shows consistent superiority over existing methods in terms of stability, overall performance and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
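A toy sketch of the context-inference idea, assuming isotropic Gaussian contexts, MAP assignment, and hand-picked variances; the paper performs full online Bayesian inference over an infinite Gaussian mixture:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    d = x.size
    return np.exp(-np.sum((x - mu) ** 2) / (2 * var)) / ((2 * np.pi * var) ** (d / 2))

class OnlineContextInference:
    """Chinese-restaurant-process style online context assignment on environment features."""
    def __init__(self, alpha=1.0, context_var=0.25, prior_var=4.0):
        self.alpha, self.context_var, self.prior_var = alpha, context_var, prior_var
        self.means, self.counts = [], []

    def assign(self, feature):
        n = sum(self.counts)
        # Existing contexts: CRP weight times Gaussian likelihood around the context mean.
        scores = [c / (n + self.alpha) * gaussian_pdf(feature, mu, self.context_var)
                  for mu, c in zip(self.means, self.counts)]
        # New context: CRP concentration times a broad prior-predictive density.
        scores.append(self.alpha / (n + self.alpha)
                      * gaussian_pdf(feature, np.zeros_like(feature), self.prior_var))
        k = int(np.argmax(scores))
        if k == len(self.means):                      # instantiate a new context
            self.means.append(feature.astype(float).copy())
            self.counts.append(1)
        else:                                         # update the running mean of context k
            self.counts[k] += 1
            self.means[k] += (feature - self.means[k]) / self.counts[k]
        return k

# Toy usage: features from two regimes should land in two contexts.
rng = np.random.default_rng(0)
infer = OnlineContextInference()
feats = np.concatenate([rng.normal(-1, 0.1, (20, 3)), rng.normal(1, 0.1, (20, 3))])
contexts = [infer.assign(f) for f in feats]
```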
Epidemic forecasting is key to effective epidemic control and helps the world mitigate crises that threaten public health. To better understand the transmission and evolution of epidemics, we propose EpiGNN, an epidemic forecasting model based on graph neural networks. Specifically, we design a transmission risk encoding module to characterize the local and global spatial effects of regions during an epidemic and incorporate them into the model. Meanwhile, we develop a Region-Aware Graph Learner (RAGL) that takes transmission risk, geographical dependencies, and temporal information into account to better explore spatio-temporal dependencies and to make each region aware of the epidemic situation of related regions. The RAGL can also be combined with external resources, such as human mobility, to further improve prediction performance. Comprehensive experiments on five real-world epidemic-related datasets (including influenza and COVID-19) demonstrate the effectiveness of our proposed method and show that EpiGNN outperforms state-of-the-art baselines in terms of RMSE.
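As a loose illustration of the overall idea (not EpiGNN's actual architecture), the toy model below mixes a learned region-similarity graph with a fixed geographic adjacency and runs one round of message passing over recent case histories:

```python
import torch
import torch.nn as nn

class SimpleEpidemicGNN(nn.Module):
    """Toy sketch of GNN-based epidemic forecasting; sizes and the graph-mixing
    scheme are illustrative assumptions."""
    def __init__(self, num_regions, window=14, hidden=32):
        super().__init__()
        self.encoder = nn.Linear(window, hidden)          # encode each region's case history
        self.region_emb = nn.Parameter(torch.randn(num_regions, hidden))
        self.gnn = nn.Linear(hidden, hidden)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, history, geo_adj):
        # history: (num_regions, window), geo_adj: (num_regions, num_regions)
        h = torch.relu(self.encoder(history))
        learned_adj = torch.softmax(self.region_emb @ self.region_emb.T, dim=-1)
        adj = 0.5 * learned_adj + 0.5 * geo_adj           # region-aware + geographic graph
        msg = torch.relu(self.gnn(adj @ h))               # one round of message passing
        return self.out(torch.cat([h, msg], dim=-1)).squeeze(-1)

# Toy usage: predict the next value for 10 regions from 14-day histories.
model = SimpleEpidemicGNN(num_regions=10)
pred = model(torch.rand(10, 14), torch.eye(10))
```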
As one of the most effective defense methods against adversarial attacks, adversarial training tends to learn an inclusive decision boundary to improve the robustness of deep learning models. However, due to the large and unnecessary increase in the margin along adversarial directions, adversarial training causes severe cross-over between natural examples and adversarial examples, which is unfavorable for balancing the trade-off between robustness and natural accuracy. In this paper, we propose a novel adversarial training scheme to achieve a better trade-off between robustness and natural accuracy. It aims to learn a moderately inclusive decision boundary, meaning that the margins of natural examples under the decision boundary are moderate. We call this scheme Moderate-Margin Adversarial Training (MMAT); it generates finer-grained adversarial examples to mitigate the cross-over problem. We also take advantage of the logits of a well-trained teacher model to guide the learning of our model. Finally, MMAT achieves high natural accuracy and robustness under both black-box and white-box attacks. On SVHN, for example, state-of-the-art robustness and natural accuracy are achieved.
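One hedged way to read "finer-grained adversarial examples with moderate margins" is to stop perturbing each example as soon as it crosses the decision boundary, and to distill from the teacher's logits; the sketch below implements that interpretation, which may differ from the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def moderate_adv_examples(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Small PGD steps that stop per example once it is misclassified, so the
    perturbation stays near the boundary rather than maximizing the loss."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Only keep perturbing examples the model still classifies correctly (4D image input).
        still_correct = (logits.argmax(1) == y).float().view(-1, 1, 1, 1)
        x_adv = x_adv.detach() + still_correct * alpha * grad.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()

def mmat_like_loss(model, teacher, x, y, temperature=2.0, kd_weight=1.0):
    """Adversarial cross-entropy plus distillation toward a well-trained teacher's logits."""
    x_adv = moderate_adv_examples(model, x, y)
    logits = model(x_adv)
    with torch.no_grad():
        t_logits = teacher(x)
    kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                  F.softmax(t_logits / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    return F.cross_entropy(logits, y) + kd_weight * kd
```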
Freespace detection is an essential component of autonomous driving technology and plays an important role in trajectory planning. In the past decade, deep-learning-based freespace detection methods have been proven feasible. However, these efforts have focused on urban road environments, and few deep-learning-based methods have been designed specifically for off-road freespace detection, due to the lack of off-road benchmarks. In this paper, we present the ORFD dataset, which, to our knowledge, is the first off-road freespace detection dataset. The dataset was collected in different scenes (woodland, farmland, grassland and countryside), different weather conditions (sunny, rainy, foggy and snowy) and different light conditions (bright light, daylight, twilight, darkness), and comprises a total of 12,198 LiDAR point cloud and RGB image pairs, with the traversable area, non-traversable area and unreachable area annotated in detail. We propose a novel network named OFF-Net, which unifies a Transformer architecture to aggregate local and global information, meeting the large-receptive-field requirement of the freespace detection task. We also propose cross-attention to dynamically fuse LiDAR and RGB image information for accurate off-road freespace detection. The dataset and code are publicly available at https://github.com/chaytonmin/off-net.
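A minimal sketch of cross-attention fusion between camera and LiDAR features (dimensions and the residual layout are assumptions, not OFF-Net's exact design): RGB tokens act as queries and attend over LiDAR tokens, so image features are dynamically enriched with geometric cues before segmentation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Cross-attention fusion: flattened RGB feature tokens query LiDAR feature tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feats, lidar_feats):
        # rgb_feats: (B, H*W, dim) flattened image tokens; lidar_feats: (B, N, dim).
        fused, _ = self.attn(query=rgb_feats, key=lidar_feats, value=lidar_feats)
        return self.norm(rgb_feats + fused)   # residual connection keeps the RGB content

# Toy usage with random features.
fusion = CrossModalFusion()
out = fusion(torch.randn(2, 64 * 64, 256), torch.randn(2, 1024, 256))
```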
Mask-based pre-training has achieved great success in self-supervised learning for images, videos, and language without the supervision of manual annotation. However, for point clouds, which are information-redundant data, it has not yet been studied in the field of 3D object detection. Since the point clouds in 3D object detection are large-scale, it is infeasible to reconstruct the input point clouds. In this paper, we propose a masked voxel classification network for pre-training on large-scale point clouds. Our key idea is to divide the point cloud into voxel representations and classify whether each voxel contains points. This simple strategy makes the network voxel-aware of the object shape, thereby improving the performance of 3D object detection. Extensive experiments show the effectiveness of our pre-trained model with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on three popular datasets (KITTI, Waymo, and nuScenes). The code is publicly available at https://github.com/chaytonmin/voxel-mae.
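A simplified sketch of how the pre-training targets could be constructed (the voxel size, range, and masking ratio are assumed values, not the paper's settings): voxelize the point cloud, hide a random subset of voxels from the input, and label each hidden voxel by whether it contains points.

```python
import numpy as np

def make_masked_voxel_targets(points, voxel_size=0.5, grid_range=(-40, 40), mask_ratio=0.7,
                              rng=np.random.default_rng(0)):
    """Build an occupancy grid, mask a random subset of voxels, and return binary labels
    (does each masked voxel contain points?) for the classification objective."""
    lo, hi = grid_range
    n_vox = int((hi - lo) / voxel_size)
    idx = np.clip(((points[:, :3] - lo) / voxel_size).astype(int), 0, n_vox - 1)
    occupancy = np.zeros((n_vox, n_vox, n_vox), dtype=bool)
    occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = True      # voxels that contain points

    mask = rng.random(occupancy.shape) < mask_ratio        # voxels hidden from the input
    visible = occupancy & ~mask                            # what the encoder gets to see
    labels = occupancy[mask].astype(np.float32)            # binary targets for masked voxels
    return visible, mask, labels

# Toy usage: 10k random points in an 80 m cube.
pts = np.random.default_rng(1).uniform(-40, 40, size=(10_000, 3))
visible, mask, labels = make_masked_voxel_targets(pts)
print(visible.sum(), "visible occupied voxels;", labels.mean(), "of masked voxels occupied")
```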